Local feature fusion network-based few-shot image classification
Authors
Abstract
Objective: Few-shot learning is a challenging task that aims to classify new categories using only a limited number of labeled samples. Metric-based meta-learning methods are currently the mainstream approach to few-shot classification, but they typically use only the global features of an image, and classification performance depends heavily on the feature extraction network. To make full use of local image features and to improve the generalization ability of the model, a few-shot classification method based on local feature fusion is proposed. Method: First, the input image is divided into multi-scale grid blocks and fed into the feature extraction network to obtain local features. Second, a local feature fusion module based on the Transformer architecture is designed to produce locally enhanced features that contain global information, improving the generalization ability of the model. Finally, using Euclidean distance as the metric, the distance between each query-set feature vector and each support-set class prototype is computed to perform classification. Result: Compared with state-of-the-art methods on three datasets commonly used for few-shot classification, under the 5-way 1-shot and 5-way 5-shot settings the proposed method improves accuracy over the second-best results by 2.96% and 2.9% on MiniImageNet and by 3.22% and 1.77% on CUB (Caltech-UCSD Birds-200-2011), while its accuracy on TieredImageNet is comparable to the best result; the experimental results demonstrate the effectiveness of the proposed method. Conclusion: The proposed few-shot classification method makes full use of local image features and improves both the feature extraction and generalization abilities of the model, yielding more accurate few-shot classification.

Objective: Convolutional neural network based (CNN-based) deep learning techniques have brought great progress to image recognition, detection, segmentation, and related fields. However, the ability of a CNN usually relies on a large number of labeled samples: a model suffers from over-fitting when samples of some categories are insufficient, and collecting and labeling samples is time-consuming and costly. Human perception, in contrast, can learn from small amounts of data; for example, a person can easily recognize new images of a category after seeing only a few pictures of it. To give machines a similar ability, few-shot learning has attracted growing attention. Few-shot learning aims to classify new categories with a limited amount of annotated data. Metric-based meta-learning methods are among the most effective few-shot methods. However, they are usually implemented on the basis of global features, which cannot represent image structure adequately; more local feature information should be involved as well, since local features are discriminative and transferable across categories. Some methods obtain pixel-level deep local descriptors by removing the last average pooling layer of the CNN, but the classification effect of such local descriptors is restricted because contextual information about the image is sacrificed. Additionally, because the feature extraction network is trained on few instances, it is difficult for it to generalize well. To fully utilize local features and improve the generalization ability of the model, we develop a few-shot classification method based on local feature fusion. Method: First, to obtain local features, the input image is divided into H×W blocks that are then fed into the feature extraction network, so that each representation describes local information; multi-scale grid partitioning is used as well. Second, to fuse the relationships between the multiple local representations, we design a Transformer-architecture-based fusion module, because the self-attention mechanism can capture dependencies within sequences effectively. Each local representation is enhanced with global context after fusion, and we concatenate the fused representations as the final output, so the original features are enhanced and the generalization ability is improved. Finally, the Euclidean distance between each query embedding and each support class prototype is calculated for classification. Our training process has two steps: pre-training and meta-training. In the pre-training stage, the backbone with a Softmax layer attached is trained on the whole training set, using data augmentation such as random cropping, horizontal flipping, and color jittering. After pre-training, the model is initialized with the pre-trained weights and all components are fine-tuned with an episodic training strategy. For a fair comparison with other methods, ResNet12 is used as the feature extraction network, and the cross-entropy loss is optimized through stochastic gradient descent (SGD). The initial learning rate is set to 5×10⁻⁴; we train for 100 epochs in total, the learning rate is halved every 10 epochs, and each epoch uses 100 training episodes and 600 validation episodes. Because the TieredImageNet dataset is larger and its domain differences are greater, more iterations are needed for convergence; we therefore train for 200 epochs and decay the learning rate every 20 epochs. In the test stage, 5 000 episodes are selected randomly to evaluate accuracy. Result: Comparative experiments are carried out on three benchmark datasets for few-shot classification. On MiniImageNet, accuracy is improved by 2.96% and 2.9% under the 5-way 1-shot and 5-way 5-shot settings, respectively; on CUB it is increased by 3.22% and 1.77%; on TieredImageNet, the proposed method is equivalent to the state-of-the-art accuracy. To fully verify the effectiveness of the method, ablation experiments are also carried out. Conclusion: We propose a local feature fusion based few-shot classification method. It makes sufficient use of local image features and further enhances the feature extraction and generalization abilities of the model, and the fusion module can potentially be embedded into other few-shot frameworks. Illustrative sketches of the grid partitioning, feature fusion, distance metric, and training schedule are given below.
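The abstract describes obtaining local features by dividing the input image into multi-scale grid blocks before feature extraction. The following is a minimal sketch of that idea, assuming 84×84 inputs, 1×1 and 2×2 grid scales, and a small placeholder backbone standing in for the ResNet12 used in the paper; all of these specifics are illustrative assumptions rather than the authors' exact configuration.

```python
# Hedged sketch: multi-scale grid partitioning of input images into local blocks
# before feature extraction. Block scales, resolution, and the toy backbone are
# assumptions for illustration only.
import torch
import torch.nn.functional as F


def grid_partition(images: torch.Tensor, grid: int) -> torch.Tensor:
    """Split (B, C, H, W) images into a grid x grid set of blocks, each resized
    back to the original resolution. Returns (B, grid*grid, C, H, W)."""
    b, c, h, w = images.shape
    bh, bw = h // grid, w // grid
    blocks = []
    for i in range(grid):
        for j in range(grid):
            patch = images[:, :, i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            # Resize each block so the same backbone can process every scale.
            blocks.append(F.interpolate(patch, size=(h, w), mode="bilinear",
                                        align_corners=False))
    return torch.stack(blocks, dim=1)


def multi_scale_local_features(images, backbone, grids=(1, 2)):
    """Extract one feature vector per block at every grid scale.
    Returns (B, num_blocks, feat_dim) with num_blocks = sum(g*g for g in grids)."""
    feats = []
    for g in grids:
        blocks = grid_partition(images, g)            # (B, g*g, C, H, W)
        b, n, c, h, w = blocks.shape
        flat = blocks.reshape(b * n, c, h, w)
        feats.append(backbone(flat).reshape(b, n, -1))
    return torch.cat(feats, dim=1)


if __name__ == "__main__":
    # Toy backbone standing in for ResNet12: global-average-pooled conv features.
    backbone = torch.nn.Sequential(
        torch.nn.Conv2d(3, 64, 3, stride=2, padding=1),
        torch.nn.ReLU(),
        torch.nn.AdaptiveAvgPool2d(1),
        torch.nn.Flatten(),
    )
    imgs = torch.randn(4, 3, 84, 84)                  # miniImageNet-style crops
    local = multi_scale_local_features(imgs, backbone, grids=(1, 2))
    print(local.shape)                                # torch.Size([4, 5, 64])
```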
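The local feature fusion module is described as Transformer-based, using self-attention to enrich each local representation with global context and concatenating the results as the final embedding. Below is a minimal sketch built on PyTorch's standard Transformer encoder; the layer count, head count, feature dimension, and the residual connection are assumptions, not the paper's exact design.

```python
# Hedged sketch of a Transformer-based local feature fusion module: local feature
# vectors are treated as a token sequence, enhanced by self-attention, and the
# enhanced tokens are concatenated into one embedding.
import torch
import torch.nn as nn


class LocalFeatureFusion(nn.Module):
    def __init__(self, feat_dim: int = 64, num_heads: int = 4, num_layers: int = 1):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=num_heads,
            dim_feedforward=2 * feat_dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, local_feats: torch.Tensor) -> torch.Tensor:
        """local_feats: (B, num_blocks, feat_dim) -> (B, num_blocks * feat_dim)."""
        enhanced = self.encoder(local_feats)          # self-attention over blocks
        enhanced = enhanced + local_feats             # keep the original local info (assumed residual)
        return enhanced.flatten(start_dim=1)          # concatenate all tokens


if __name__ == "__main__":
    fusion = LocalFeatureFusion(feat_dim=64)
    tokens = torch.randn(4, 5, 64)                    # e.g. 1x1 + 2x2 grid blocks
    print(fusion(tokens).shape)                       # torch.Size([4, 320])
```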
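Classification is performed by measuring the Euclidean distance between query embeddings and support-set class prototypes. The sketch below shows a prototypical-network-style version of this metric step; the embedding dimension and episode shape are illustrative.

```python
# Hedged sketch of the metric step: class prototypes are mean support embeddings,
# and queries are assigned to the nearest prototype under squared Euclidean
# distance (negative distances used as logits).
import torch
import torch.nn.functional as F


def prototypical_logits(support: torch.Tensor, support_labels: torch.Tensor,
                        query: torch.Tensor, num_classes: int) -> torch.Tensor:
    """support: (N_s, D), support_labels: (N_s,) in [0, num_classes),
    query: (N_q, D). Returns logits of shape (N_q, num_classes)."""
    prototypes = torch.stack(
        [support[support_labels == c].mean(dim=0) for c in range(num_classes)])
    dists = torch.cdist(query, prototypes)            # pairwise Euclidean distances
    return -dists.pow(2)


if __name__ == "__main__":
    # A 5-way 1-shot episode with 15 queries per class and 320-d embeddings.
    support = torch.randn(5, 320)
    support_labels = torch.arange(5)
    query = torch.randn(75, 320)
    query_labels = torch.arange(5).repeat_interleave(15)
    logits = prototypical_logits(support, support_labels, query, num_classes=5)
    loss = F.cross_entropy(logits, query_labels)
    acc = (logits.argmax(dim=1) == query_labels).float().mean()
    print(loss.item(), acc.item())
```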
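The pre-training stage reportedly uses random cropping, horizontal flipping, and color jittering for data augmentation. A possible torchvision pipeline is sketched below; the crop size, jitter strengths, and normalization statistics are assumptions.

```python
# Hedged sketch of the pre-training data augmentation named in the abstract.
from torchvision import transforms

pretrain_transform = transforms.Compose([
    transforms.RandomResizedCrop(84),                 # random cropping (assumed 84x84)
    transforms.RandomHorizontalFlip(),                # horizontal flipping
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet statistics (assumed)
                         std=[0.229, 0.224, 0.225]),
])
```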
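Meta-training is described as optimizing a cross-entropy loss with SGD at an initial learning rate of 5×10⁻⁴, halved every 10 epochs, for 100 epochs of 100 episodes each (200 epochs with a 20-epoch decay step for TieredImageNet). A schematic training loop consistent with that schedule follows; `model` and `sample_episode` are hypothetical placeholders, and the momentum and weight-decay values are assumptions.

```python
# Hedged sketch of the meta-training schedule reported in the abstract.
import torch


def meta_train(model, sample_episode, epochs=100, episodes_per_epoch=100):
    # Momentum and weight decay are not stated in the abstract; values are assumed.
    optimizer = torch.optim.SGD(model.parameters(), lr=5e-4,
                                momentum=0.9, weight_decay=5e-4)
    # Halve the learning rate every 10 epochs (every 20 epochs for TieredImageNet).
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)
    for _ in range(epochs):
        for _ in range(episodes_per_epoch):
            logits, labels = sample_episode(model)    # one N-way K-shot episode (placeholder)
            loss = torch.nn.functional.cross_entropy(logits, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        scheduler.step()
```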
Similar resources
Feature Based Image Fusion
The paper proposes a method for fusing registered multi-focus images using various characteristic properties of the images as attributes for fusion. Experiments are conducted on a large set of images, and the results of image fusion using different image attributes are analyzed with quality assessment algorithms to estimate how well the information contained in the source images is represented in...
Few-shot Classification by Learning Disentangled Representations
Machine learning has improved state-of-the-art performance in numerous domains by using large amounts of data. In reality, labelled data is often not available for the task of interest. A fundamental problem of artificial intelligence is finding a representation that can generalize to never-before-seen classes. In this research, the power of generative models is combined with disentangled repr...
Image Feature Classification Based on Particle Swarm Optimization Neural Network
Image feature classification is one of the basic problems of image processing and computer vision, and it is also a key step in image analysis. The BP neural network has been extensively applied to feature classification, and it can classify specific objects or features through early learning; however, the BP algorithm also has many defects, including slow convergence speed and a tendency to be trapped in...
FSSD: Feature Fusion Single Shot Multibox Detector
SSD (Single Shot Multibox Detector) is one of the best object detection algorithms with both high accuracy and fast speed. However, SSD's feature pyramid detection method makes it hard to fuse the features from different scales. In this paper, we propose FSSD (Feature Fusion Single Shot Multibox Detector), an enhanced SSD with a novel and lightweight feature fusion module which can improve the ...
Multiscale High-Level Feature Fusion for Histopathological Image Classification
Histopathological image classification is one of the most important steps in disease diagnosis. We propose a method for multiclass histopathological image classification based on a deep convolutional neural network referred to as the coding network. It can obtain a better representation of the histopathological image than using the coding network alone. The main process is training a deep convolutiona...
Journal
Journal title: Journal of Image and Graphics
Year: 2023
ISSN: 1006-8961
DOI: https://doi.org/10.11834/jig.220079